Detecting the impact of sequencing errors on SAGE data

Authors

  • Jacques Colinge
  • Georg Feger
Abstract

SAGE data are obtained by sequencing short DNA tags. Because DNA sequencing is error-prone, SAGE data contain errors. We propose a new approach to identify tags whose abundance is biased by sequencing errors. The approach is based on a concept of neighbourhood: abundant tags can contaminate tags whose sequences are very close. Applying it reveals that moderately abundant tags can be generated solely by sequencing errors. It also allows correct rare tags to be detected. AVAILABILITY: Software is available only to non-profit entities and for non-commercial purposes upon request.
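
The neighbourhood idea in the abstract can be illustrated with a small sketch. This is not the authors' method or software, which the abstract does not describe in detail. It assumes that errors are single-base substitutions only, that the per-base error rate is uniform (the value `0.01` is a placeholder), and that a simple ratio threshold is used. The function names `single_substitution_neighbours`, `expected_contamination` and `flag_suspect_tags` are hypothetical.

```python
BASES = "ACGT"


def single_substitution_neighbours(tag):
    """Yield every tag that differs from `tag` at exactly one position."""
    for i, original in enumerate(tag):
        for base in BASES:
            if base != original:
                yield tag[:i] + base + tag[i + 1:]


def expected_contamination(counts, per_base_error_rate):
    """Estimate how many erroneous reads each observed tag receives
    from its single-substitution neighbours.

    Assumption: a given base is misread as one specific other base
    with probability per_base_error_rate / 3.
    """
    rate = per_base_error_rate / 3.0
    return {
        tag: rate * sum(counts.get(n, 0)
                        for n in single_substitution_neighbours(tag))
        for tag in counts
    }


def flag_suspect_tags(counts, per_base_error_rate=0.01, ratio=0.5):
    """Return tags whose expected contamination is at least `ratio`
    times their observed count (an illustrative threshold, not a
    statistical test)."""
    expected = expected_contamination(counts, per_base_error_rate)
    return {tag for tag, c in counts.items() if expected[tag] >= ratio * c}


# Toy library: one very abundant tag, one close neighbour, one unrelated tag.
counts = {"AAAAAAAAAA": 3000, "AAAAAAAAAC": 8, "GGGGGGGGGG": 8}
suspects = flag_suspect_tags(counts)
```

In this toy library the two rare tags have the same count, but only the one adjacent to the abundant tag is flagged. The unrelated rare tag has no abundant neighbour, so under these assumptions it is more plausibly a correct rare tag.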

Similar resources

Correction of sequence-based artifacts in serial analysis of gene expression

MOTIVATION Serial Analysis of Gene Expression (SAGE) is a powerful technology for measuring global gene expression, through rapid generation of large numbers of transcript tags. Beyond their intrinsic value in differential gene expression analysis, SAGE tag collections afford abundant information on the size and shape of the sample transcriptome and can accelerate novel gene discovery. These la...

Statistical modeling of sequencing errors in SAGE libraries.

MOTIVATION Sequencing errors may bias the gene expression measurements made by Serial Analysis of Gene Expression (SAGE). They may introduce non-existent tags at low abundance and decrease the real abundance of other tags. These effects are increased in the longer tags generated in LongSAGE libraries. Current sequencing technology generates quite accurate estimates of sequencing error rates. He...

Serial Analysis of Gene Expression (SAGE) - Sequencing Errors

Serial Analysis of Gene Expression (SAGE) is a technique to study overall gene expression in different (normal or disease) tissues. Results take a form of a so-called SAGE library for each of the tissues studied. A SAGE library is a set of text-strings (typically 10base-pairs long), called tags. A tag is representative for a gene that is active in a particular cell or tissue. From a statistical...

An approach to fault detection and correction in design of systems using of Turbo ‎codes‎

We present an approach to design of fault tolerant computing systems. In this paper, a technique is employed that enable the combination of several codes, in order to obtain flexibility in the design of error correcting codes. Code combining techniques are very effective, which one of these codes are turbo codes. The Algorithm-based fault tolerance techniques that to detect errors rely on the c...

Impact of Measuring Devices and Data Analysis on the Determination of Gas Membrane Properties

The time-lag method, using a gas permeation experiment, is currently the most popular method for determining the membrane properties: diffusivity coefficient and permeability coefficient, and from which the solubility coefficient can be calculated. In this investigation, the impact of systematic, random (noise), resolution and extrapolation errors associated with gas permeatio...

Journal:
  • Bioinformatics

Volume 17, Issue 9

Pages: -

Publication date: 2001